
Identifying gene regulatory modules of heat shock response in yeast

Author: Li Wen-Hsiung
Wu Wei-Sheng
Publication venue: BioMed Central
Publication date: 01/01/2008

Different evolutionary patterns between young duplicate genes in the human genome

Author: Gu Zhenglong
Li Wen-Hsiung
Zhang Peng
Publication venue: BioMed Central
Publication date: 01/09/2003

BACKGROUND: Following gene duplication, two duplicate genes may experience relaxed functional constraints or acquire different mutations, and may also diverge in function. Whether the two copies will evolve in different patterns remains unclear, however, because previous studies have reached conflicting conclusions. To resolve this issue by providing a general picture, we studied 250 independent pairs of young duplicate genes from the whole human genome. RESULTS: We showed that nearly 60% of the young duplicate gene pairs have evolved at the amino-acid level at significantly different rates from each other. More than 25% of these gene pairs also showed significantly different ratios of nonsynonymous to synonymous rates (Ka/Ks ratios). Moreover, duplicate pairs with different rates of amino-acid substitution also tend to differ in the Ka/Ks ratio, with the fast-evolving copy tending to have a slightly higher Ks than the slow-evolving one. Lastly, a substantial portion of fast-evolving copies have accumulated amino-acid substitutions evenly across the protein sequences, whereas most of the slow-evolving copies exhibit uneven substitution patterns. CONCLUSIONS: Our results suggest that duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence. Such different evolutionary patterns may be largely due to different functional constraints on the two copies.

Springer

arXiv.org e-Print Archive

Shape restricted regression with random Bernstein polynomials

Author: Chang I-Shou
Chien Li-Chu
Hsiung Chao A.
Wen Chi-Chung
Wu Yuh-Jenn
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2007

Shape restricted regressions, including isotonic regression and concave regression as special cases, are studied using priors on Bernstein polynomials and Markov chain Monte Carlo methods. These priors have large supports, select only smooth functions, can easily incorporate geometric information into the prior, and can be generated without computational difficulty. Algorithms generating priors and posteriors are proposed, and simulation studies are conducted to illustrate the performance of this approach. Comparisons with the density-regression method of Dette et al. (2006) are included. Comment: Published at http://dx.doi.org/10.1214/074921707000000157 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
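The abstract does not spell out how shape constraints enter the Bernstein basis, so the following is only an illustrative sketch of a standard fact: a Bernstein polynomial on [0, 1] whose coefficients are nondecreasing is itself nondecreasing, which is one simple way an isotonic shape could be encoded. The function name `bernstein` and the coefficient values are assumptions chosen for illustration, not taken from the paper.

```python
from math import comb

def bernstein(coeffs, x):
    """Evaluate sum_k c_k * C(n, k) * x^k * (1 - x)^(n - k) for x in [0, 1]."""
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * x**k * (1 - x) ** (n - k)
        for k, c in enumerate(coeffs)
    )

# Hypothetical nondecreasing coefficients (an isotonic shape constraint).
coeffs = [0.0, 0.1, 0.5, 0.6, 1.0]

# Check monotonicity numerically on a grid over [0, 1].
grid = [i / 50 for i in range(51)]
values = [bernstein(coeffs, x) for x in grid]
is_monotone = all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
print(is_monotone)
```

In a Bayesian treatment, a prior could be placed on such constrained coefficient vectors and sampled by MCMC; how the paper constructs its priors is not described in this record.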

Overlapping genes in the human and mouse genomes

Author: Li Wen-Hsiung
Sanna Chaitanya R
Zhang Liqing
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes. RESULTS: About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and are in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two different-strand overlapping genes is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only ~8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosomes in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition. CONCLUSION: Our study contributes to the understanding of the evolutionary transition between overlapping genes and non-overlapping genes and demonstrates the high rates of evolutionary changes in the untranslated regions.
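The overlap categories named in the abstract (same-strand versus different-strand, embedded versus not embedded, convergent orientation) can be illustrated with a small classifier. This is a minimal sketch, not the authors' pipeline: the `Gene` record, the closed genomic coordinates, and the `classify_overlap` function are hypothetical names introduced here, and "divergent" is the usual term for the opposite of convergent rather than a category quoted from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    start: int   # leftmost genomic coordinate (assumed inclusive)
    end: int     # rightmost genomic coordinate (assumed inclusive)
    strand: str  # "+" or "-"

def classify_overlap(a: Gene, b: Gene) -> str:
    """Label the relationship between two genes on the same chromosome."""
    if a.end < b.start or b.end < a.start:
        return "non-overlapping"
    kind = "same-strand" if a.strand == b.strand else "different-strand"
    # Embedded: one gene lies entirely within the other.
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    if a_in_b or b_in_a:
        return f"{kind}, embedded"
    if kind == "same-strand":
        return "same-strand, partial"
    # Different strands, partial overlap: if the plus-strand gene starts
    # first, the 3' ends overlap (convergent); otherwise the 5' ends do.
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    orientation = "convergent" if plus.start < minus.start else "divergent"
    return f"different-strand, {orientation}"

print(classify_overlap(Gene(100, 500, "+"), Gene(400, 900, "-")))
```

Applying such a function to every neighboring gene pair in each species would give the per-category counts that a comparison like the one described above relies on.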

Multidimensional scaling for large genomic data sets

Author: Li Wen-Hsiung
Lu Henry Horng-Shing
Tzeng Jengnan
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Multi-dimensional scaling (MDS) aims to represent high-dimensional data in a low-dimensional space while preserving the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data sets, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although a number of studies have applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods is over O(N^2), so it is difficult to process a data set with a large number of genes N, such as whole-genome microarray data. RESULTS: We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable to large data sets. Computer simulation showed that the new method, split-and-combine MDS (SC-MDS), is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, and the clustering results are obtained about three times faster. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole-genome data.
CONCLUSION: Our new method reduces the computational complexity from O(N^3) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully reconstructs the low-dimensional representation, as does classical MDS. Its performance depends on the grouping method and the minimal number of intersection points between groups. Feasible grouping methods are suggested; each group must contain both neighboring and far-apart data points. Our method can represent a large, high-dimensional data set in a low-dimensional space both efficiently and effectively.
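For orientation, classical metric MDS (the baseline the abstract compares against) can be sketched in a few lines: square the distances, double-center them, and take the leading eigenvectors. The sketch below recovers only one dimension and uses plain power iteration to stay dependency-free; it is not SC-MDS, whose split-and-combine steps are not detailed in this record, and the function name `classical_mds_1d` is an assumption.

```python
import math

def classical_mds_1d(dist, iters=200):
    """Recover 1-D coordinates (up to shift and sign) from a distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D assumed symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenpair of B; assumes the start
    # vector is not orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(n)]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigval = norm
    # Coordinates are the eigenvector scaled by sqrt(eigenvalue).
    return [math.sqrt(eigval) * x for x in v]

# Hypothetical points on a line; only their pairwise distances are used.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(p - q) for q in points] for p in points]
coords = classical_mds_1d(dist)
print(coords)
```

A full implementation would use a proper eigendecomposition and keep several leading eigenvectors; the cost of that decomposition on an N-by-N matrix is what makes classical MDS expensive for large N.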

Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle

Author: Chen Bor-Sen
Li Wen-Hsiung
Wu Wei-Sheng
Publication venue: BioMed Central
Publication date: 01/09/2006

BACKGROUND: A transcriptional regulatory module (TRM) is a set of genes that is regulated by a common set of transcription factors (TFs). By organizing the genome into TRMs, a living cell can coordinate the activities of many genes and carry out complex functions. Therefore, identifying TRMs is helpful for understanding gene regulation. RESULTS: Integrating gene expression and ChIP-chip data, we developed a method, called MOdule Finding Algorithm (MOFA), for reconstructing TRMs of the yeast cell cycle. MOFA identified 87 TRMs, which together contain 336 distinct genes regulated by 40 TFs. Using various kinds of data, we validated the biological relevance of the identified TRMs. Our analysis shows that different combinations of a fairly small number of TFs are responsible for regulating a large number of genes involved in different cell cycle phases and that there may exist crosstalk between the cell cycle and other cellular processes. MOFA is capable of finding many novel TF-target gene relationships and can determine whether a TF is an activator and/or a repressor. Finally, MOFA refines some clusters proposed by previous studies and provides a better understanding of how the complex expression program of the cell cycle is regulated. CONCLUSION: MOFA was developed to reconstruct TRMs of the yeast cell cycle. Many of these TRMs are in agreement with previous studies. Further, MOFA inferred many interesting modules and novel TF combinations. We believe that computational analysis of multiple types of data will be a powerful approach to studying complex biological systems as more genomic resources, such as genome-wide protein activity data and protein-protein interaction data, become available.

National Tsing Hua University Institutional Repository

CpG island density and its correlations with genomic features in mammalian genomes

Author: Han Leng
Li Wen-Hsiung
Su Bing
Zhao Zhongming
Publication venue: BioMed Central
Publication date: 01/01/2008

A systematic analysis of CpG islands in ten mammalian genomes suggests that an increase in chromosome number elevates GC content and prevents loss of CpG islands.

VCU Scholars Compass

Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data

Author: Chen Bor-Sen
Li Wen-Hsiung
Wu Wei-Sheng
Publication venue: BioMed Central
Publication date: 01/06/2007

BACKGROUND: ChIP-chip data, which indicate binding of transcription factors (TFs) to DNA regions in vivo, are widely used to reconstruct transcriptional regulatory networks. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop methods to identify regulatory targets of TFs from ChIP-chip data. RESULTS: We developed a method, called Temporal Relationship Identification Algorithm (TRIA), which uses gene expression data to identify a TF's regulatory targets among its binding targets inferred from ChIP-chip data. We applied TRIA to yeast cell cycle microarray data and identified many plausible regulatory targets of cell cycle TFs. We validated our predictions by checking the enrichments for functional annotation and known cell cycle genes. Moreover, we showed that TRIA performs better than two published methods (MA-Network and MFA). It is known that co-regulated genes may not be co-expressed. TRIA has the ability to identify subsets of highly co-expressed genes among the regulatory targets of a TF. Different functional roles are found for different subsets, indicating the diverse functions a TF could have. Finally, as a control, we showed that TRIA also performs well for TFs unrelated to the cell cycle. CONCLUSION: Finding the regulatory targets of TFs is important for understanding how cells change their transcription program to adapt to environmental stimuli. Our algorithm TRIA is helpful for achieving this purpose.

National Tsing Hua University Institutional Repository